library(readxl)  # read_excel() is provided by readxl, not openxlsx
excel_file <- "statistic_id281744_gdp-of-the-uk-1948-2023.xlsx"
Data <- "Data"
UK_GDP.df <- readxl::read_excel(excel_file)
View(UK_GDP.df)
UK_GDP.df$Years
## [1] "1948" "1949" "1950" "1951" "1952" "1953" "1954" "1955" "1956" "1957"
## [11] "1958" "1959" "1960" "1961" "1962" "1963" "1964" "1965" "1966" "1967"
## [21] "1968" "1969" "1970" "1971" "1972" "1973" "1974" "1975" "1976" "1977"
## [31] "1978" "1979" "1980" "1981" "1982" "1983" "1984" "1985" "1986" "1987"
## [41] "1988" "1989" "1990" "1991" "1992" "1993" "1994" "1995" "1996" "1997"
## [51] "1998" "1999" "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007"
## [61] "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016" "2017"
## [71] "2018" "2019" "2020" "2021" "2022" "2023"
UK_GDP.df$`GDPinMillions`
## [1] 381461 394097 407268 422401 428734 452548 472254 491243 499555
## [10] 509379 516239 538278 572313 587589 593887 622566 658062 671975
## [19] 682419 701215 739468 753606 773977 801969 836938 891381 869155
## [28] 856049 881878 903587 941461 976566 956533 949961 968629 1008981
## [37] 1031385 1073547 1106912 1166905 1229650 1259191 1266248 1248461 1251525
## [46] 1279802 1323443 1355023 1390010 1458467 1508263 1554509 1621644 1663462
## [55] 1693271 1746551 1788931 1837927 1881785 1931094 1926735 1837817 1878960
## [64] 1900476 1929229 1963807 2026566 2071561 2111357 2167415 2197841 2233921
## [73] 2002489 2176203 2270764 2274050
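UK_GDP.df <- data.frame(ds = UK_GDP.df$Years, y = UK_GDP.df$GDPinMillions)
# Years is stored as character, so convert it to numeric before taking the start year.
UK_GDP_ts <- ts(UK_GDP.df$y, start = min(as.numeric(UK_GDP.df$ds)), frequency = 1)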
UK_GDP_ts
## Time Series:
## Start = 1948
## End = 2023
## Frequency = 1
## [1] 381461 394097 407268 422401 428734 452548 472254 491243 499555
## [10] 509379 516239 538278 572313 587589 593887 622566 658062 671975
## [19] 682419 701215 739468 753606 773977 801969 836938 891381 869155
## [28] 856049 881878 903587 941461 976566 956533 949961 968629 1008981
## [37] 1031385 1073547 1106912 1166905 1229650 1259191 1266248 1248461 1251525
## [46] 1279802 1323443 1355023 1390010 1458467 1508263 1554509 1621644 1663462
## [55] 1693271 1746551 1788931 1837927 1881785 1931094 1926735 1837817 1878960
## [64] 1900476 1929229 1963807 2026566 2071561 2111357 2167415 2197841 2233921
## [73] 2002489 2176203 2270764 2274050
class(UK_GDP_ts)
## [1] "ts"
library(prophet)
## Loading required package: Rcpp
## Loading required package: rlang
library(rlang)
library(Rcpp)
# Pin each year to 1 January so that the dates do not depend on the day the script is run.
UK_GDP.df$ds <- as.Date(paste0(UK_GDP.df$ds, "-01-01"))
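# A minimal sketch of why the date format matters, using a hypothetical variable 'year_chr'. When only "%Y" is supplied, as.Date() takes the missing month and day from the current date, whereas an explicit "-01-01" suffix gives the same date on every run.
year_chr <- "1948"
run_dependent <- as.Date(year_chr, format = "%Y")  # month and day come from today's date
fixed <- as.Date(paste0(year_chr, "-01-01"))       # always 1 January of that year
format(run_dependent, "%m-%d")
format(fixed, "%m-%d")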
m <- prophet(UK_GDP.df,
changepoint.prior.scale = 0.05, n.changepoints = 17)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Prophet disables daily and weekly seasonality automatically here, because my data are recorded yearly.
f = prophet::make_future_dataframe(m,periods = 8, freq = "year")
Future_Years <- f
Predicted_Forecast <- predict(m, Future_Years)
plot(m,Predicted_Forecast)
# The graph above plots the observed data together with the fitted model 'm' and its forecast 'Predicted_Forecast' for the next 8 years. The forecast suggests that the upward trend will continue at a fairly consistent rate, with the UK's GDP reaching approximately 2,500,000 (£million) by 2031.
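# The final forecast value can also be read from the forecast table rather than estimated from the plot. A minimal sketch, assuming the forecast is a data frame with the columns 'ds', 'yhat', 'yhat_lower' and 'yhat_upper' that predict() returns for a prophet model; 'last_forecast' is a hypothetical helper name.
last_forecast <- function(forecast) {
  # Keep the date, point forecast and uncertainty bounds for the final forecast row.
  tail(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")], 1)
}
# Usage with the objects in this script: last_forecast(Predicted_Forecast)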
prophet_plot_components(m, Predicted_Forecast)
# From the trend and seasonality graphs above, we see that the trend in the UK's GDP from 1948 to 2023 is upward: GDP increases approximately linearly with time (in years).
# The yearly seasonality panel looks different. It shows larger fluctuations between January and June than in the rest of the year, with two pronounced dips in March and April.
# One possible explanation is that these troughs in March and April correspond to the UK fiscal year, which runs from 1st April to 31st March. However, because the data contain only one observation per year, within-year patterns cannot be identified reliably from them, so this seasonality should be interpreted with caution.
dyplot.prophet(m, Predicted_Forecast)
## Warning: `select_()` was deprecated in dplyr 0.7.0.
## ℹ Please use `select()` instead.
## ℹ The deprecated feature was likely used in the prophet package.
## Please report the issue at <https://github.com/facebook/prophet/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# The graph above is an interactive version of the forecast plot for my data.
Regression_model<-lm(y~ds, UK_GDP.df)
summary(Regression_model)
##
## Call:
## lm(formula = y ~ ds, data = UK_GDP.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -153895 -69772 4888 70261 158536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.995e+05 1.235e+04 64.73 <2e-16 ***
## ds 7.244e+01 1.253e+00 57.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87550 on 74 degrees of freedom
## Multiple R-squared: 0.9783, Adjusted R-squared: 0.978
## F-statistic: 3341 on 1 and 74 DF, p-value: < 2.2e-16
# My linear regression model shows a strong positive relationship between time and GDP. Because 'ds' is stored as a Date, the slope is measured per day: GDP tends to increase by approximately £72.44 million per day, or roughly £26,500 million per year. The adjusted R-squared of 0.978 means the model explains about 97.8% of the variance in GDP, suggesting that it is a good fit for the data.
# Both the intercept and the 'ds' (time) coefficient are highly significant (p<0.001), indicating a statistically reliable relationship. The residual standard error is approximately £87,550 million, which represents the typical deviation of observed GDP values from the fitted values.
# In conclusion, the regression model provides fairly reliable and valuable results.
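# Because 'ds' is a Date, lm() treats it as a number of days, so the 'ds' coefficient is a change per day. A minimal sketch of converting it to a change per year: the value 72.44 is copied from the summary output above, and 'days_per_year' is an assumed average that allows for leap years.
slope_per_day <- 72.44
days_per_year <- 365.25
slope_per_year <- slope_per_day * days_per_year  # £ million per year
slope_per_year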
plot(UK_GDP.df$ds, UK_GDP.df$y, xlab = "Year", ylab = "GDP in Millions")
# The graph above shows that, in general, GDP has increased over the years.
# However, if we look at the graph in more detail, we can see sharp changes in GDP around 1974-75, 1980-81, 1991 and 2009, where GDP fell before growth resumed.
# The sharp fall in 2009 was likely caused by the 2007/2008 financial crisis, with growth resuming from 2010 as the economy recovered from the recession.
# Additionally, there are two anomalies in the years 2020 and 2021.
# A fair assumption is that these lower GDP values in 2020 and 2021 were caused by COVID-19.
plot(fitted(Regression_model),rstandard(Regression_model),xlab= "Fitted Values", ylab= "Standardised Residuals", main= "Residuals vs Fitted", type= "l")
co2.df = data.frame( ds=zoo::as.yearmon(time(co2)), y=co2)
m2<- prophet::prophet(co2.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Again, Prophet disables daily and weekly seasonality automatically, because the CO2 example is recorded monthly.
f2 = prophet::make_future_dataframe(m2, periods=8, freq="quarter")
p = predict(m2, f2)
plot(m2,p)
# The graph above allows a comparison between the data in my project and the CO2 example. Both graphs display a positive relationship between the x and y variables. However, the CO2 example measures time monthly, whereas my data measure time in years.
# In both data sets 8 periods have been predicted into the future. For my data 8 periods = 8 years, but for the CO2 example the periods are quarters, so 8 periods = 2 years.
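# The difference in forecast horizon can be illustrated with base R date sequences. This is only a sketch: the start dates are assumed to be the last observation in each data set, and prophet builds its own future dates internally.
gdp_future <- seq(as.Date("2023-01-01"), by = "year", length.out = 9)[-1]     # 8 yearly steps
co2_future <- seq(as.Date("1997-12-01"), by = "quarter", length.out = 9)[-1]  # 8 quarterly steps
range(gdp_future)
range(co2_future)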
prophet_plot_components(m2, p)
# The trend graphs for my dataset and the CO2 dataset are very similar: both CO2 concentration and UK GDP increase steadily over time.
# In contrast, the yearly seasonality graph for CO2 looks very different from the one for my GDP dataset. In the CO2 example there are more and larger fluctuations over the year, with the largest dip around September.
dyplot.prophet(m2,p)
# Displayed above is an interactive version of the forecast plot for the CO2 example dataset.
Regression_model2<-lm(y~ds, co2.df)
summary(Regression_model2)
##
## Call:
## lm(formula = y ~ ds, data = co2.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0399 -1.9476 -0.0017 1.9113 6.5149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.250e+03 2.127e+01 -105.8 <2e-16 ***
## ds 1.308e+00 1.075e-02 121.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared: 0.9695, Adjusted R-squared: 0.9694
## F-statistic: 1.479e+04 on 1 and 466 DF, p-value: < 2.2e-16
# The CO2 example has a slightly lower adjusted R-squared (96.9%) than my GDP model (97.8%), indicating that the CO2 model leaves slightly more of the variance in its data unexplained.
# Both models show highly significant coefficients for the intercept and the time ('ds') terms, which implies that both have robust relationships between time and their respective variables (CO2 levels for the CO2 example and GDP for my dataset).
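# R-squared can be read from a fitted model directly instead of from the printed summary. A minimal sketch that refits the CO2 regression on the built-in 'co2' series; 'co2_fit' is a hypothetical name, and time in decimal years is assumed to match the 'ds' column used above.
co2_fit <- lm(as.numeric(co2) ~ as.numeric(time(co2)))
summary(co2_fit)$r.squared      # multiple R-squared
summary(co2_fit)$adj.r.squared  # adjusted R-squared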
plot(co2.df$ds, co2.df$y, xlab = "Year", ylab = "Parts per Million")
# The CO2 graph also shows a positive, approximately linear relationship between its x and y variables, similar to my data set. It should also be noted that there are more points plotted on this graph, because the CO2 example contains more observations (468 monthly values) than my data (76 yearly values).
plot(fitted(Regression_model2),rstandard(Regression_model2),xlab= "Fitted Values", ylab= "Standardised Residuals", main= "Residuals vs Fitted", type= "l")
plot(decompose(co2))
# Here, we see the decomposition of the CO2 example as a time series. The 'random' panel is the noise left after removing the other components: Noise = Observed Data - Trend - Seasonality.
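# The identity above can be checked numerically. A minimal sketch, assuming the default additive decomposition; 'co2_parts' and 'reconstructed' are hypothetical names.
co2_parts <- decompose(co2)
reconstructed <- co2_parts$trend + co2_parts$seasonal + co2_parts$random
# The trend is undefined at both ends of the series, so missing values are ignored.
max(abs(co2 - reconstructed), na.rm = TRUE)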
# To conclude this report, the CO2 example and my project data have strong similarities as well as differences.
# I propose that my project has more reliable data, because the CO2 example only contains observations up to 1997, whereas my dataset runs until 2023.
# More recent data allow better predictions of the near future for a time series, and so produce more accurate and reliable results. Moreover, since my data set contains fewer observed points, it is easier to read on graphs.
# However, it should be noted that these two data sets are completely independent of, and unrelated to, one another. We can therefore only compare them to a certain extent, as the response variables are not the same and use completely different units (£ millions for GDP and parts per million for CO2).
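# The time span of the built-in CO2 series can be checked directly in base R. A minimal sketch using only the 'co2' dataset that ships with R:
start(co2)   # first year and month in the series
end(co2)     # last year and month in the series
length(co2)  # number of monthly observations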